An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal. Such techniques, including Classification Without Labels (CWoLa) and Simulation Assisted Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset. They cannot take advantage of the multiple datasets that are commonly available and thus cannot fully exploit available information. In this work, we propose generalizations of CWoLa and SALAD for settings where multiple reference datasets are available, building on weak supervision techniques. We demonstrate improved performance in a number of settings with realistic and synthetic data. As an added benefit, our generalizations enable us to provide finite-sample guarantees, improving on existing asymptotic analyses.
Translated by Google Translate
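Weak supervision (WS) is a powerful method for building labeled datasets to train supervised models when little or no labeled data is available. It replaces hand-labeled data with an aggregate of multiple noisy but cheap label estimates expressed by labeling functions (LFs). Although it has been used successfully in many domains, the application scope of weak supervision is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground-truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings, a set of diverse application domains on which it has previously been difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction for expanding the application scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, we ask whether a practitioner should use an AutoWS method to generate additional labels or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning. We observe that in many settings, AutoWS methods must incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 facilitates future research in this direction. We conclude with a thorough ablation study of AutoWS methods.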
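As a rough illustration of the aggregation step described above, the following minimal sketch combines the votes of a few labeling functions by majority vote. The toy spam task, the three labeling functions, and the names `ABSTAIN`, `majority_vote`, and `pseudo_label` are all hypothetical, and majority vote is only the simplest possible aggregator; it is not the method of any framework named in this abstract, which typically learn per-source accuracies instead.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote


# Hypothetical labeling functions for a toy spam task (1 = spam, 0 = not spam).
def lf_has_link(text):
    return 1 if "http" in text else ABSTAIN


def lf_mentions_prize(text):
    return 1 if "prize" in text.lower() else ABSTAIN


def lf_is_short_greeting(text):
    return 0 if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_has_link, lf_mentions_prize, lf_is_short_greeting]


def majority_vote(votes):
    """Return the most common non-abstaining vote; abstain on a tie or no votes."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the top two labels are tied
    return ranked[0][0]


def pseudo_label(text):
    """Apply every labeling function to the text and aggregate the votes."""
    return majority_vote([lf(text) for lf in LABELING_FUNCTIONS])
```

Calling `pseudo_label` on each unlabeled example yields a noisy training label, or `ABSTAIN` when the sources give no usable signal; a downstream model would then be trained only on the examples that received a label.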
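Foundation models offer an exciting new paradigm for building models from out-of-the-box embeddings and a few labeled examples. However, it is not clear how best to apply foundation models when no labeled data is available. One potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources (pre-trained models, heuristics, crowd workers) to construct pseudolabels. The challenge is to build a combination that best exploits the signal available in both the foundation models and the weak sources. We propose Liger, a combination that uses foundation model embeddings to improve two crucial elements of existing weak supervision techniques. First, we produce finer estimates of weak source quality by partitioning the embedding space and learning source accuracies for each part. Second, we improve source coverage by extending source votes in the embedding space. Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that the lift scales with the smoothness of the label distributions in the embedding space. On six benchmark NLP and video tasks, Liger outperforms vanilla weak supervision, outperforms weakly supervised kNN and adapters by 11.8 points, and outperforms kNN and adapters supervised by traditional hand labels by 7.2 points.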
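Weak supervision (WS) frameworks are a popular way to bypass hand-labeling large datasets for training data-hungry models. These approaches synthesize multiple noisy but cheaply acquired label estimates into a set of high-quality pseudolabels for downstream training. However, the synthesis technique is specific to a particular kind of label, such as binary labels or sequences, and each new label type requires manually designing a new synthesis algorithm. Instead, we propose a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees. We apply this technique to important problems not previously tackled by WS frameworks, including learning to rank, regression, and learning in hyperbolic manifolds. Theoretically, our synthesis approach produces a consistent estimator for learning a challenging but important generalization of the exponential family model. Experimentally, we validate our framework and show improvement over baselines in diverse settings, including real-world learning-to-rank and regression problems as well as learning on hyperbolic manifolds.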
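Most existing neural architecture search (NAS) benchmarks and algorithms prioritize well-studied tasks, such as image classification on CIFAR or ImageNet. As a result, the performance of NAS approaches in more diverse areas is poorly understood. In this paper, we present NAS-Bench-360, a benchmark suite for evaluating methods on domains beyond those traditionally studied in architecture search, and use it to address the following question: do state-of-the-art NAS methods perform well on diverse tasks? To construct the benchmark, we curate ten tasks spanning a variety of application domains, dataset sizes, problem dimensionalities, and learning objectives. Each task is carefully chosen to interoperate with modern CNN-based search methods while possibly being far afield from their original development domain. To speed up NAS research and reduce its cost, for two of the tasks we release the precomputed performance of 15,625 architectures comprising a standard CNN search space. Experimentally, we show the need for the more robust NAS evaluation that NAS-Bench-360 enables, finding that several modern NAS procedures perform inconsistently across the ten tasks, with many catastrophically poor results. We also demonstrate how NAS-Bench-360 and its associated precomputed results will enable future scientific discoveries by testing several hypotheses recently promoted in the NAS literature. NAS-Bench-360 is hosted at https://nb360.ml.cmu.edu.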
Following the advent of immersive technologies and the increasing interest in representing interactive geometrical formats, 3D Point Clouds (PC) have emerged as a promising solution and an effective means of displaying 3D visual information. In addition to other challenges in immersive applications, objective and subjective quality assessment of compressed 3D content remains an open problem and an area of active research. Yet most efforts in this area ignore the local geometrical structure between point representations. In this paper, we overcome this limitation by introducing a novel and efficient objective metric for Point Cloud Quality Assessment that learns local intrinsic dependencies using a Graph Neural Network (GNN). To evaluate the performance of our method, we use two well-known datasets. The results demonstrate the effectiveness and reliability of our solution compared to state-of-the-art metrics.
Despite the recent success of multi-task learning and pre-finetuning for natural language understanding, few works have studied the effects of task families on abstractive text summarization. Task families are a form of task grouping during the pre-finetuning stage to learn common skills, such as reading comprehension. To close this gap, we analyze the influence of multi-task learning strategies using task families for the English abstractive text summarization task. We train task groups under one of three strategies, i.e., sequential, simultaneous, and continual multi-task learning, and evaluate trained models through two downstream tasks. We find that certain combinations of task families (e.g., advanced reading comprehension and natural language inference) positively impact downstream performance. Further, we find that the choice and combination of task families influence downstream performance more than the training scheme, supporting the use of task families for abstractive text summarization.
The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work. However, the role of large autoregressive transformers in generating machine-paraphrased plagiarism and their detection is still developing in the literature. This work explores T5 and GPT-3 for machine-paraphrase generation on scientific articles from arXiv, student theses, and Wikipedia. We evaluate the detection performance of six automated solutions and one commercial plagiarism detection software and perform a human study with 105 participants regarding their detection performance and the quality of generated examples. Our results suggest that large models can rewrite text humans have difficulty identifying as machine-paraphrased (53% mean acc.). Human experts rate the quality of paraphrases generated by GPT-3 as high as original texts (clarity 4.0/5, fluency 4.2/5, coherence 3.8/5). The best-performing detection model (GPT-3) achieves a 66% F1-score in detecting paraphrases.
Deep generative models parametrized up to a normalizing constant (e.g. energy-based models) are difficult to train by maximizing the likelihood of the data because the likelihood and/or gradients thereof cannot be explicitly or efficiently written down. Score matching is a training method whereby, instead of fitting the likelihood $\log p(x)$ for the training data, we fit the score function $\nabla_x \log p(x)$ -- obviating the need to evaluate the partition function. Though this estimator is known to be consistent, it is unclear whether (and when) its statistical efficiency is comparable to that of maximum likelihood -- which is known to be (asymptotically) optimal. We initiate this line of inquiry in this paper, and show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated -- i.e. the Poincar\'e, log-Sobolev and isoperimetric constants -- quantities which govern the mixing time of Markov processes like Langevin dynamics. Roughly, we show that the score matching estimator is statistically comparable to the maximum likelihood when the distribution has a small isoperimetric constant. Conversely, if the distribution has a large isoperimetric constant -- even for simple families of distributions like exponential families with rich enough sufficient statistics -- score matching will be substantially less efficient than maximum likelihood. We suitably formalize these results both in the finite sample regime, and in the asymptotic regime. Finally, we identify a direct parallel in the discrete setting, where we connect the statistical properties of pseudolikelihood estimation with approximate tensorization of entropy and the Glauber dynamics.
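To make concrete why fitting the score avoids the partition function, here is the standard score matching objective in a notation that is assumed for illustration and does not appear in the abstract: an energy-based model $p_\theta(x) = \exp(-E_\theta(x)) / Z(\theta)$ with energy $E_\theta$ and partition function $Z(\theta)$. Because $Z(\theta)$ does not depend on $x$, it drops out of the model score:

$$\nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x) - \nabla_x \log Z(\theta) = -\nabla_x E_\theta(x).$$

Under mild regularity conditions, integration by parts then rewrites the score matching loss in a form that needs only samples from the data distribution $p$, not its unknown score:

$$\frac{1}{2}\,\mathbb{E}_{x \sim p}\left[\left\|\nabla_x \log p_\theta(x) - \nabla_x \log p(x)\right\|^2\right] = \mathbb{E}_{x \sim p}\left[\operatorname{tr}\left(\nabla_x^2 \log p_\theta(x)\right) + \frac{1}{2}\left\|\nabla_x \log p_\theta(x)\right\|^2\right] + C,$$

where $C$ does not depend on $\theta$. Neither side requires evaluating $Z(\theta)$, which is the computational advantage whose statistical cost the abstract examines.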
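Uncertainty quantification in automated image analysis is highly desired in many applications. Typically, machine learning models for classification or segmentation are developed only to provide binary answers. However, quantifying the uncertainty of a model can play a critical role in, for example, active learning or human-machine interaction. Uncertainty quantification is especially difficult with deep learning-based models, which are the state of the art in many imaging applications. Current uncertainty quantification approaches do not scale well to high-dimensional real-world problems. Scalable solutions often rely on classical techniques applied during inference, or on training ensembles of identical models with different random seeds, to obtain a posterior distribution. In this paper, we show that these approaches fail to approximate the classification probability. Instead, we propose a scalable and intuitive framework for calibrating ensembles of deep learning models to produce uncertainty quantification measurements that approximate the classification probability. On unseen test data, we demonstrate improved calibration, sensitivity (in two out of three cases), and precision compared with the standard approaches. We further motivate the use of our method in active learning, in creating pseudo-labels to learn from unlabeled images, and in human-machine collaboration.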